Time to compare cost of living around the world.
Our dataset will be used from website https://www.numbeo.com/cost-of-living/
Let's read collected data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import folium
import geopandas as gpd
from IPython.display import display, HTML, IFrame
import requests
file = r'C:\Users\rocze\Desktop\pythonProject\COL.json'
data = pd.read_json(file)
print(data)
CountryName Meal, Inexpensive Restaurant \
0 Afghanistan 1.67 $
1 Aland Islands 13.09 $
2 Albania 5.51 $
3 Alderney 22.32 $
4 Algeria 2.19 $
.. ... ...
229 Vietnam 2.12 $
230 Western Sahara 3.20 $
231 Yemen 5.99 $
232 Zambia 3.67 $
233 Zimbabwe 6.00 $
Meal for 2 People, Mid-range Restaurant, Three-course \
0 7.78 $
1 64.10 $
2 27.57 $
3 72.38 $
4 14.63 $
.. ...
229 21.19 $
230 26.71 $
231 19.97 $
232 29.40 $
233 40.00 $
McMeal at McDonalds (or Equivalent Combo Meal) Cappuccino (regular) \
0 3.08 $ 1.06 $
1 10.68 $ 3.74 $
2 5.51 $ 1.31 $
3 9.65 $ 3.52 $
4 3.66 $ 0.61 $
.. ... ...
229 4.66 $ 1.84 $
230 5.34 $ 1.07 $
231 9.99 $ 3.18 $
232 3.41 $ 1.75 $
233 6.00 $ 2.21 $
Coke/Pepsi (0.33 liter bottle) Water (0.33 liter bottle) \
0 0.35 $ 0.16 $
1 2.24 $ 1.74 $
2 1.32 $ 0.63 $
3 1.99 $ 1.01 $
4 0.48 $ 0.20 $
.. ... ...
229 0.53 $ 0.32 $
230 0.53 $ 0.32 $
231 1.37 $ 0.67 $
232 0.83 $ 0.46 $
233 0.65 $ 0.47 $
Milk (regular), (1 liter) Loaf of Fresh White Bread (500g) \
0 0.58 $ 0.23 $
1 1.10 $ 1.87 $
2 1.35 $ 0.72 $
3 1.81 $ 2.41 $
4 0.78 $ 0.16 $
.. ... ...
229 1.48 $ 1.02 $
230 0.75 $ 0.53 $
231 2.26 $ 1.25 $
232 1.19 $ 0.98 $
233 1.58 $ 1.00 $
Rice (white), (1kg) ... 1 Pair of Nike Running Shoes (Mid-Range) \
0 1.20 $ ... 29.64 $
1 2.07 $ ... 117.51 $
2 1.42 $ ... 74.80 $
3 ? ... ?
4 1.09 $ ... 65.82 $
.. ... ... ...
229 0.97 $ ... 93.32 $
230 1.71 $ ... 69.44 $
231 4.00 $ ... 74.23 $
232 1.83 $ ... 90.18 $
233 1.49 $ ... 39.44 $
1 Pair of Men Leather Business Shoes Apartment (1 bedroom) in City Centre \
0 34.27 $ 93.89 $
1 131.76 $ 747.82 $
2 86.88 $ 353.76 $
3 ? ?
4 75.38 $ 167.91 $
.. ... ...
229 93.98 $ 377.45 $
230 74.78 $ 320.50 $
231 89.56 $ 265.13 $
232 105.64 $ 274.41 $
233 41.11 $ 282.35 $
Apartment (1 bedroom) Outside of Centre \
0 55.57 $
1 560.87 $
2 221.69 $
3 ?
4 106.18 $
.. ...
229 235.36 $
230 160.25 $
231 163.62 $
232 193.01 $
233 260.83 $
Apartment (3 bedrooms) in City Centre \
0 226.32 $
1 1,228.57 $
2 610.71 $
3 ?
4 282.35 $
.. ...
229 859.01 $
230 640.99 $
231 540.18 $
232 924.31 $
233 828.57 $
Apartment (3 bedrooms) Outside of Centre \
0 137.82 $
1 988.19 $
2 363.63 $
3 965.10 $
4 185.38 $
.. ...
229 492.50 $
230 267.08 $
231 403.36 $
232 438.29 $
233 661.54 $
Price per Square Meter to Buy Apartment in City Centre \
0 614.26 $
1 2,848.85 $
2 1,747.62 $
3 ?
4 1,047.51 $
.. ...
229 2,754.98 $
230 598.26 $
231 1,304.22 $
232 10,763.91 $
233 1,069.10 $
Price per Square Meter to Buy Apartment Outside of Centre \
0 251.91 $
1 2,029.80 $
2 900.76 $
3 ?
4 626.80 $
.. ...
229 1,465.91 $
230 491.43 $
231 789.14 $
232 39,211.94 $
233 1,266.67 $
Average Monthly Net Salary (After Tax) \
0 181.99 $
1 2,599.57 $
2 420.96 $
3 ?
4 263.73 $
.. ...
229 445.56 $
230 213.66 $
231 378.65 $
232 396.94 $
233 295.45 $
Mortgage Interest Rate in Percentages (%), Yearly, for 20 Years Fixed-Rate
0 13.36
1 1.95
2 4.73
3 ?
4 6.57
.. ...
229 9.98
230 ?
231 11.25
232 23.25
233 15.00
[234 rows x 50 columns]
That's a lot of data, we should focus of few columns.
selectedData = data[['CountryName','Coke/Pepsi (0.33 liter bottle)','Water (0.33 liter bottle)','Rice (white), (1kg)','1 Pair of Nike Running Shoes (Mid-Range)','Apartment (1 bedroom) in City Centre','Apartment (3 bedrooms) in City Centre','Price per Square Meter to Buy Apartment in City Centre','Average Monthly Net Salary (After Tax)']]
print(selectedData)
CountryName Coke/Pepsi (0.33 liter bottle) Water (0.33 liter bottle) \
0 Afghanistan 0.35 $ 0.16 $
1 Aland Islands 2.24 $ 1.74 $
2 Albania 1.32 $ 0.63 $
3 Alderney 1.99 $ 1.01 $
4 Algeria 0.48 $ 0.20 $
.. ... ... ...
229 Vietnam 0.53 $ 0.32 $
230 Western Sahara 0.53 $ 0.32 $
231 Yemen 1.37 $ 0.67 $
232 Zambia 0.83 $ 0.46 $
233 Zimbabwe 0.65 $ 0.47 $
Rice (white), (1kg) 1 Pair of Nike Running Shoes (Mid-Range) \
0 1.20 $ 29.64 $
1 2.07 $ 117.51 $
2 1.42 $ 74.80 $
3 ? ?
4 1.09 $ 65.82 $
.. ... ...
229 0.97 $ 93.32 $
230 1.71 $ 69.44 $
231 4.00 $ 74.23 $
232 1.83 $ 90.18 $
233 1.49 $ 39.44 $
Apartment (1 bedroom) in City Centre \
0 93.89 $
1 747.82 $
2 353.76 $
3 ?
4 167.91 $
.. ...
229 377.45 $
230 320.50 $
231 265.13 $
232 274.41 $
233 282.35 $
Apartment (3 bedrooms) in City Centre \
0 226.32 $
1 1,228.57 $
2 610.71 $
3 ?
4 282.35 $
.. ...
229 859.01 $
230 640.99 $
231 540.18 $
232 924.31 $
233 828.57 $
Price per Square Meter to Buy Apartment in City Centre \
0 614.26 $
1 2,848.85 $
2 1,747.62 $
3 ?
4 1,047.51 $
.. ...
229 2,754.98 $
230 598.26 $
231 1,304.22 $
232 10,763.91 $
233 1,069.10 $
Average Monthly Net Salary (After Tax)
0 181.99 $
1 2,599.57 $
2 420.96 $
3 ?
4 263.73 $
.. ...
229 445.56 $
230 213.66 $
231 378.65 $
232 396.94 $
233 295.45 $
[234 rows x 9 columns]
But let's first clear our data, we have to remove some things from our table
selectedData = selectedData.applymap(lambda x: x.replace('\u00a0$', '') if isinstance(x, str) else x)
selectedData = selectedData.applymap(lambda x: x.replace(',', '') if isinstance(x, str) else x)
selectedData.iloc[:, 1:] = selectedData.iloc[:, 1:].apply(lambda x: pd.to_numeric(x, errors='coerce'))
print(selectedData)
CountryName Coke/Pepsi (0.33 liter bottle) Water (0.33 liter bottle) \
0 Afghanistan 0.35 0.16
1 Aland Islands 2.24 1.74
2 Albania 1.32 0.63
3 Alderney 1.99 1.01
4 Algeria 0.48 0.2
.. ... ... ...
229 Vietnam 0.53 0.32
230 Western Sahara 0.53 0.32
231 Yemen 1.37 0.67
232 Zambia 0.83 0.46
233 Zimbabwe 0.65 0.47
Rice (white), (1kg) 1 Pair of Nike Running Shoes (Mid-Range) \
0 1.2 29.64
1 2.07 117.51
2 1.42 74.8
3 NaN NaN
4 1.09 65.82
.. ... ...
229 0.97 93.32
230 1.71 69.44
231 4.0 74.23
232 1.83 90.18
233 1.49 39.44
Apartment (1 bedroom) in City Centre \
0 93.89
1 747.82
2 353.76
3 NaN
4 167.91
.. ...
229 377.45
230 320.5
231 265.13
232 274.41
233 282.35
Apartment (3 bedrooms) in City Centre \
0 226.32
1 1228.57
2 610.71
3 NaN
4 282.35
.. ...
229 859.01
230 640.99
231 540.18
232 924.31
233 828.57
Price per Square Meter to Buy Apartment in City Centre \
0 614.26
1 2848.85
2 1747.62
3 NaN
4 1047.51
.. ...
229 2754.98
230 598.26
231 1304.22
232 10763.91
233 1069.1
Average Monthly Net Salary (After Tax)
0 181.99
1 2599.57
2 420.96
3 NaN
4 263.73
.. ...
229 445.56
230 213.66
231 378.65
232 396.94
233 295.45
[234 rows x 9 columns]
Therefore we need to change the type of columns from object to numerical
selectedData['Coke/Pepsi (0.33 liter bottle)'] = selectedData['Coke/Pepsi (0.33 liter bottle)'].astype(float)
selectedData['Water (0.33 liter bottle)'] = selectedData['Water (0.33 liter bottle)'].astype(float)
selectedData['Rice (white), (1kg)'] = selectedData['Rice (white), (1kg)'].astype(float)
selectedData['1 Pair of Nike Running Shoes (Mid-Range)'] = selectedData['1 Pair of Nike Running Shoes (Mid-Range)'].astype(float)
selectedData['Apartment (1 bedroom) in City Centre'] = selectedData['Apartment (1 bedroom) in City Centre'].astype(float)
selectedData['Apartment (3 bedrooms) in City Centre'] = selectedData['Apartment (3 bedrooms) in City Centre'].astype(float)
selectedData['Price per Square Meter to Buy Apartment in City Centre'] = selectedData['Price per Square Meter to Buy Apartment in City Centre'].astype(float)
selectedData['Average Monthly Net Salary (After Tax)'] = selectedData['Average Monthly Net Salary (After Tax)'].astype(float)
selectedData.dtypes
CountryName object Coke/Pepsi (0.33 liter bottle) float64 Water (0.33 liter bottle) float64 Rice (white), (1kg) float64 1 Pair of Nike Running Shoes (Mid-Range) float64 Apartment (1 bedroom) in City Centre float64 Apartment (3 bedrooms) in City Centre float64 Price per Square Meter to Buy Apartment in City Centre float64 Average Monthly Net Salary (After Tax) float64 dtype: object
Our columns names are little to long.
selectedData.columns=['Country','Coke 0.33l','Water 0.33l','Rice 1kg','Nike Runners','1B Apartment','3B Apartment','1SM Price in Center','Avg Salary']
print(selectedData)
Country Coke 0.33l Water 0.33l Rice 1kg Nike Runners \
0 Afghanistan 0.35 0.16 1.20 29.64
1 Aland Islands 2.24 1.74 2.07 117.51
2 Albania 1.32 0.63 1.42 74.80
3 Alderney 1.99 1.01 NaN NaN
4 Algeria 0.48 0.20 1.09 65.82
.. ... ... ... ... ...
229 Vietnam 0.53 0.32 0.97 93.32
230 Western Sahara 0.53 0.32 1.71 69.44
231 Yemen 1.37 0.67 4.00 74.23
232 Zambia 0.83 0.46 1.83 90.18
233 Zimbabwe 0.65 0.47 1.49 39.44
1B Apartment 3B Apartment 1SM Price in Center Avg Salary
0 93.89 226.32 614.26 181.99
1 747.82 1228.57 2848.85 2599.57
2 353.76 610.71 1747.62 420.96
3 NaN NaN NaN NaN
4 167.91 282.35 1047.51 263.73
.. ... ... ... ...
229 377.45 859.01 2754.98 445.56
230 320.50 640.99 598.26 213.66
231 265.13 540.18 1304.22 378.65
232 274.41 924.31 10763.91 396.94
233 282.35 828.57 1069.10 295.45
[234 rows x 9 columns]
Something wrong is with USA and other names, let's fix it!
row = selectedData.loc[selectedData['Country'] == "United States"]
selectedData.at[row.index[0], 'Country'] = 'United States of America'
row2 = selectedData.loc[selectedData['Country'] == "Czech Republic"]
selectedData.at[row2.index[0], 'Country'] = 'Czechia'
row3 = selectedData.loc[selectedData['Country'] == "Kosovo (Disputed Territory)"]
selectedData.at[row3.index[0], 'Country'] = 'Kosovo'
row4 = selectedData.loc[selectedData['Country'] == "Republic Of Congo"]
selectedData.at[row4.index[0], 'Country'] = 'Dem. Rep. Congo'
row5 = selectedData.loc[selectedData['Country'] == "South Sudan"]
selectedData.at[row5.index[0], 'Country'] = 'S. Sudan'
row6 = selectedData.loc[selectedData['Country'] == "Bosnia And Herzegovina"]
selectedData.at[row6.index[0], 'Country'] = 'Bosnia and Herz.'
For our visualization we need world map
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.to_file("world-countries.json", driver="GeoJSON")
map=r"C:\Users\rocze\world-countries.json"
m = folium.Map(location=[48, -102], zoom_start=3)
folium.Choropleth(
geo_data=map,
data=selectedData,
columns=["Country", "Avg Salary"],
key_on="feature.properties.name",
fill_color="YlGn",
fill_opacity=0.7,
line_opacity=0.2,
legend_name="Average Salary (USD)",
reset=True
).add_to(m)
m.save("avg_salary_map.html")
Here is the map we created: https://github.com/Alcosmao/CostOfLifeAroundTheWorld/blob/main/avg_salary_map.html
But let's create also non interactiv map
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[['name', 'geometry']]
merged = world.merge(selectedData, left_on='name', right_on='Country', how='left')
merged['range'] = pd.cut(merged['Avg Salary'], bins=11)
fig, ax = plt.subplots(figsize=(14, 10))
merged.plot(column='range', cmap='Greens', linewidth=0.8, ax=ax, edgecolor='0.8', legend=True,legend_kwds={"loc": "lower left", "title": "Average Salary in USD p/m", "frameon": False})
ax.set_title('Average $ monthly salary by Country')
ax.set_axis_off()
plt.show()
Time to show top 5 bottom and top countries
selectedData = selectedData.dropna(subset=['Avg Salary'])
sortedData = selectedData.sort_values(by='Avg Salary')
topFive = sortedData.tail(5)
bottomFive = sortedData.head(5)
concatData = pd.concat([topFive, bottomFive])
ax = concatData.plot(kind='bar', x='Country', y='Avg Salary', color=['red', 'blue', 'green', 'orange', 'purple', 'purple', 'orange', 'green', 'blue', 'red'])
for x in ax.containers:
ax.bar_label(x, label_type='edge')
plt.show()
We can hardly see the bottom values. Try another way
topFive = sortedData.tail(5)
bottomFive = sortedData.head(5)
plt.bar(topFive["Country"], topFive["Avg Salary"], color=['red', 'blue', 'green', 'orange', 'purple'])
for i, v in enumerate(topFive["Avg Salary"]):
plt.text(i, v, str(v), ha='center', va='bottom')
plt.title("Top 5 Countries by Average Salary")
plt.xlabel("Country")
plt.ylabel("Average Salary")
plt.figure()
plt.bar(bottomFive["Country"], bottomFive["Avg Salary"], color=['red', 'blue', 'green', 'orange', 'purple'])
for i, v in enumerate(bottomFive["Avg Salary"]):
plt.text(i, v, str(v), ha='center', va='bottom')
plt.title("Bottom 5 Countries by Average Salary")
plt.xlabel("Country")
plt.ylabel("Average Salary")
plt.show()
Okay, time to see how much of the kg rice we can by for our salary in each country
selectedData = selectedData.dropna(subset=['Rice 1kg', 'Avg Salary'])
selectedData['Rice per Salary'] = selectedData['Avg Salary'] / selectedData['Rice 1kg']
plt.scatter(selectedData['Avg Salary'], selectedData['Rice per Salary'])
plt.xlabel('Average Salary')
plt.ylabel('Rice per Salary')
plt.show()
I don't think we can see anything here! Let's do it more interactive!
fig = px.scatter(selectedData, x="Rice per Salary", y="Avg Salary", color="Country", hover_name="Country", size_max=70)
fig.update_layout(
title="Rice Affordability by Country",
xaxis_title="Rice per Salary (Kg)",
yaxis_title="Average Salary per Month (USD)",
legend_title="Country",
hoverlabel=dict(
bgcolor="black",
font_size=16,
font_family="Arial"
)
)
fig.show()
When You click on country from list or dot on chart all information will be displayed!
For the next chart, let's calculate how many years it would take to save enough money for a 72m2 flat, assuming that we can save 20% of our salary every month.
selectedData['YearsNeededFor72mFlat'] = np.ceil(0.2 * (72 * selectedData['1SM Price in Center']) / selectedData['Avg Salary'])
print(selectedData[['Country','YearsNeededFor72mFlat']])
Country YearsNeededFor72mFlat 0 Afghanistan 49.0 1 Aland Islands 16.0 2 Albania 60.0 4 Algeria 58.0 6 Andorra 24.0 .. ... ... 229 Vietnam 90.0 230 Western Sahara 41.0 231 Yemen 50.0 232 Zambia 391.0 233 Zimbabwe 53.0 [222 rows x 2 columns]
We will skip values higher than 60 years, because it's not possible then to person to live that long to buy a flat!
selectedData = selectedData[selectedData["YearsNeededFor72mFlat"] < 60]
fig = px.choropleth(selectedData, locations="Country", locationmode="country names",
color="YearsNeededFor72mFlat",
hover_name="Country",
color_continuous_scale=px.colors.sequential.Plasma,
range_color=(selectedData["YearsNeededFor72mFlat"].min(), selectedData["YearsNeededFor72mFlat"].max()),
labels={"YearsNeededFor72mFlat": "Needed Years"})
fig.update_layout(title="Map of Needed Years to Save for 72 m² Flat", title_x=0.5)
fig.show()
WORK IN PROGRESS